1. Introduction

Frechet means, or Riemannian centres of mass, were introduced at a relatively
early stage of probability by Frechet (1948). The idea is simple enough: generalize
the mean-square characterization of the mean $\mathbb{E}[X]$ as the minimizer of the
"energy function" $x \mapsto \tfrac12 \mathbb{E}[(X - x)^2]$. If $X$ takes values in a metric space $\mathcal{X}$
this can be achieved as follows: replace $(X - x)^2$ by the square of the distance
function $\operatorname{dist}(X, x)^2$.

Of course the theory of Frechet means is subject to geometric complications. 
Uniqueness becomes the exception rather than the rule, though existence is 
guaranteed if the metric space satisfies some kind of local compactness condition. 
Ziezold (1977, 1989, 1994) established some basic results in this broad context,
as well as developing some significant applications in applied statistics. If the 
metric space X is specialized to a Riemannian manifold M then it is possible 
to produce useful calculations and estimates using curvature; Karcher (1977) 
provides a good account of this as well as surveying substantial applications of 
Frechet means in geometry. 

Probabilistic interest in Frechet means was initially spurred on by considera- 
tions of how to generate theories of martingales taking values in manifolds, and 
in particular how then to extend the mathematical application of martingale 
theory beyond the theory of linear elliptic differential equations to the theory 
of harmonic maps (Kendall, 1990; Picard, 1994). In particular this led to strong 
connections with convexity theory for Riemannian manifolds, simply expressed 
in Kendall (1991b) and further developed in Kendall (1991a, 1992a, 1992b) and 
Corcuera and Kendall (1999); more recently see Afsari (2011). Ziezold (1989)'s 
application of Frechet means to statistical shape theory has been taken up 
by several workers (see for example Le, 2001, 2004; also the recent survey 
by Kendall and Le, 2010). In particular Bhattacharya and Patrangenaru (2003, 
2005) and Bhattacharya and Bhattacharya (2008) have developed important 
statistical theory for empirical Frechet means on Riemannian manifolds,
including (but not limited to) laws of large numbers and central limit theory for
independent and identically distributed manifold-valued random variables.

The present paper is inspired by these results of Bhattacharya and co-workers, 
and addresses the challenge of extending their theory to the non-identically dis- 
tributed case. After Section 2, which establishes basic definitions and notation, 
in Section 3 we develop a weak law of large numbers for empirical Frechet means 
in a metric space context (Theorem 2) which is based on the most general possible
weak law of large numbers for independent non-negative random variables
(stated here as Theorem 1). In particular, we pay attention to the question of 
when one can assert existence of local empirical Frechet means lying close to a 
local minimizer of the aggregated energy function which is obtained by summing 
the individual energy functions of the random variables concerned. 

It is a natural step from this theory to consider central limit theorems of Lin- 
deberg type for empirical Frechet means, since the conditions for the weak law 
of large numbers (Theorem 2) involve conditions of Lindeberg type. To do this 
one needs to specialize to the more specific case of Riemannian manifolds, since 
this allows one to use the Riemannian Exponential map to refer the manifold to 
a Euclidean approximation. It is therefore apparent that a central limit theorem
for the Riemannian manifold case must depend on a central limit theorem
for the random tangent vectors corresponding to the manifold-valued random 
variables via this Exponential map, and Section 4 considers the relevant theory. 

In fact there is a substantial literature on central limit theorems and normal
approximations for vector-valued random variables; see Bhattacharya and Rao
(1976) for an exposition in book form, and more recently Chatterjee (2008) and
Rollin (2011) (both of whom describe approaches which apply Stein's method).
However, as we sought to generalize to a Lindeberg central limit theorem for
empirical Frechet means, it became clear that we needed a subtly different
result: a theorem which would describe when a sequence of normalized random
sums may be approximated by a second sequence of matching multivariate normal
random variables, when there is no guarantee of weak convergence, and
when the normalization uses not individual coordinate variances but the trace of
the variance-covariance matrix of the sum. These requirements mean, for example,
that one cannot simply apply the Cramer-Wold device. The closest general
result we can find in the published literature is that of Bhattacharya and Rao
(1976, Corollary 18.2) (also see Barbour and Gnedin, 2009, for specific cases
arising in the study of infinite occupancy schemes); however this uses normalization
in a matrix-valued sense, using the inverse of the symmetric square-root of the
variance-covariance matrix (which is required to be non-singular), whereas we
need an approach which uses scalar normalization and which can work even
when the variance-covariance matrix degenerates.

It turns out, as we describe in Section 4, that it is possible to formulate such 
a result, a multidimensional Lindeberg central approximation theorem, which 
we state and prove as Theorem 3 (and also Corollary 2 for the Feller converse). 
Proofs vary little from the classic approach of, say, Feller (1966). However it is
necessary to take account of the vector-valued context and to allow for a crucial 
intervention of the Wasserstein metric for the truncated Euclidean distance; 
therefore we give the proofs in full for the sake of completeness of exposition, 
since the application is unfamiliar. 

These results allow us to prove a Lindeberg central approximation theorem 
for empirical Frechet means, which forms Theorem 4 in Section 5. The ba- 
sic idea uses Newton's root-finding algorithm, and owes much to the work of 
Bhattacharya and co-workers; however while extending to the non-identically 
distributed case we are also able to clarify the set of conditions required for 
the result, by exploiting the idea of central approximation rather than central 
limits, and we can derive a rather explicit form for the variance-covariance ma- 
trices of the approximating multivariate normal random variables. The paper 
concludes with a small number of illustrative examples, demonstrating how the 
results simplify in the case of independent and identically distributed random 
variables, and also in the case when the Riemannian manifold is of constant 
sectional curvature, or carries a Kahler structure with constant holomorphic 
sectional curvature. 

2. Basic theory and notation



Consider the energy function of a random variable $X$ taking values in a metric
space $\mathcal{X}$:

\[ \phi(x) = \mathbb{E}\left[\tfrac12 \operatorname{dist}(X, x)^2\right] . \]

Observe that if $\phi$ is finite at one point of $\mathcal{X}$ then it is finite everywhere, by
an argument using the triangle inequality. Given independent $X_1, \ldots, X_n$, the
aggregate energy function is simply the sum

\[ \phi_n(x) = \sum_{m=1}^n \mathbb{E}\left[\tfrac12 \operatorname{dist}(X_m, x)^2\right] . \]

A Frechet mean is a global minimizer of $\phi$. Note that there can be more than
one Frechet mean: we then consider the set of Frechet means

\[ \arg\min_{x} \mathbb{E}\left[\tfrac12 \operatorname{dist}(X, x)^2\right] . \]

An empirical Frechet mean is a global minimizer of the energy function based
on the empirical probability measure defined by a sample $X_1, \ldots, X_n$: thus the
set of empirical Frechet means is

\[ \arg\min_{x} \frac1n \sum_{i=1}^n \tfrac12 \operatorname{dist}(X_i, x)^2 . \]

(In case of local compactness, the existence of global minimizers of both kinds
follows immediately from $\operatorname{dist}(X, y) + \operatorname{dist}(X, x) \geq \operatorname{dist}(x, y)$.)
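
As a purely illustrative aside, the following sketch computes the set of empirical Frechet means on the unit circle with arc-length distance, by brute-force minimization of the empirical energy over a grid. The choice of the circle, the grid size, the tolerance and all function names are assumptions made for this illustration only; the sketch shows how uniqueness can fail (two antipodal sample points produce two minimizers) while a tightly clustered sample has a single minimizer.

```python
import math

def circle_dist(a, b):
    """Arc-length distance between two angles (in radians) on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def empirical_energy(sample, x):
    """(1/n) * sum over the sample of (1/2) * dist(X_i, x)^2."""
    return sum(0.5 * circle_dist(xi, x) ** 2 for xi in sample) / len(sample)

def empirical_frechet_means(sample, grid_size=3600, tol=1e-12):
    """Grid points whose empirical energy is within tol of the minimum over the grid.

    This is a grid approximation (an assumption of this sketch), not an exact solver.
    """
    grid = [2 * math.pi * k / grid_size for k in range(grid_size)]
    energies = [empirical_energy(sample, x) for x in grid]
    best = min(energies)
    return [x for x, e in zip(grid, energies) if e <= best + tol]

# Two antipodal points: the minimizers sit at the two midpoints between them.
means_antipodal = empirical_frechet_means([0.0, math.pi])

# A tight cluster: a single minimizer close to the ordinary average 0.2.
means_cluster = empirical_frechet_means([0.1, 0.2, 0.3])
```

The grid search is chosen only for transparency; on a general Riemannian manifold one would instead use an iterative scheme, which is beyond the scope of this sketch.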

Some of our results hold for local minimizers; we use the term local Frechet
mean to describe a local minimizer of $\phi$, while a local empirical Frechet mean
denotes a local minimizer of the energy function based on the empirical probability
measure defined by a sample $X_1, \ldots, X_n$ of points from the metric space
$\mathcal{X}$.

We shall use the operator-theoretic notation $\mathbb{E}[H]$ to denote the expectation
of a random variable $H$. In particular we shall write $\mathbb{E}[H ; A] = \mathbb{E}[H \, \mathbb{I}[A]]$,
where $\mathbb{I}[A]$ is the indicator random variable for an event $A$.



3. Weak law of large numbers for empirical Frechet means 

Ziezold (1977) established a strong law of large numbers for sequences of
independent identically distributed random variables $X_1, X_2, \ldots$ taking values in
a separable metric space $\mathcal{X}$ (actually Ziezold covered the more general case of
a separable finite quasi-metric space). Imposing the condition that the energy
function $\mathbb{E}[\tfrac12 \operatorname{dist}(X_1, x)^2]$ be finite for some (and thus all) $x$, Ziezold was then
able to show that almost surely the limit superior (the intersection of the closures
of the tail unions) of the sets of empirical Frechet means is a subset of the set of
Frechet means (up to an event of zero probability measure):

\[ \bigcap_{k=1}^{\infty} \overline{\bigcup_{n=k}^{\infty} \arg\min_{x} \frac1n \sum_{i=1}^n \tfrac12 \operatorname{dist}(X_i, x)^2} \;\subseteq\; \arg\min_{x} \mathbb{E}\left[\tfrac12 \operatorname{dist}(X_1, x)^2\right] . \tag{1} \]

Here of course the arg min are treated as random closed sets.

If $\mathcal{X}$ is not compact then it is possible for a sequence of empirical Frechet
means to diverge to infinity even when (1) holds. Given uniqueness of the Frechet
mean, Bhattacharya and Patrangenaru (2003, Theorem 2.3) have shown that a
strong law of large numbers follows from imposition of the additional condition
that every closed bounded subset of $\mathcal{X}$ is compact; in that case every sequence
of measurable choices from the sets

\[ \arg\min_{x} \frac1n \sum_{i=1}^n \tfrac12 \operatorname{dist}(X_i, x)^2 \]

of empirical Frechet means will almost surely converge to the unique Frechet
mean.

In this section we derive a weak law of large numbers in the more general
case of non-identically distributed independent random variables $X_1, X_2, \ldots$,
taking values in a separable metric space $\mathcal{X}$ possessing the bounded compactness
property of Bhattacharya and Patrangenaru, and such that the individual
energy functions $\mathbb{E}[\tfrac12 \operatorname{dist}(X_m, x)^2]$ are finite for some (and therefore for all)
$x \in \mathcal{X}$. Evidently we need to impose extra conditions to compensate for the
lack of identical distribution; we will require that the aggregate energy function
$\phi_n(x) = \sum_{m=1}^n \mathbb{E}[\tfrac12 \operatorname{dist}(X_m, x)^2]$ has a strict local minimum near a fixed reference
point $o \in \mathcal{X}$, and we will require that this holds uniformly as $n \to \infty$ (in a
particular sense captured in the displayed equation (2) below). In recompense
for this restriction, our results describe the behaviour of local empirical Frechet
means lying in a geodesic ball $\operatorname{ball}(o, \rho_1) \subseteq \mathcal{X}$. The particular uniformity
requirement is that for each positive $\rho_0 < \rho_1$ there is positive $\kappa = \kappa(\rho_0, \rho_1)$ such
that, for all $n$,

\[ (1 + \kappa)\,\phi_n(o) \;<\; \inf\left\{ \phi_n(y) : \rho_0 < \operatorname{dist}(y, o) < \rho_1 \right\} . \tag{2} \]

Bearing in mind that the ultimate aim of this paper is to prove a central limit
theorem, convergence in probability is a more natural objective than almost sure
convergence. Therefore it is reasonable to restrict attention to the weaker notion
of convergence in probability. Moreover even in the scalar case the law-of-large-numbers
conditions for convergence in probability are clearer and more easily
stated than for convergence almost surely. The key theorem for our treatment is
the weak law of large numbers for non-identically distributed non-negative real
random variables. We state a special case of this result:

Theorem 1. Suppose that $Z_1, Z_2, \ldots$ are independent non-negative real random
variables, not necessarily of the same distribution. Suppose further that

\[ \frac{1}{\sum_{m=1}^n \mathbb{E}[Z_m]} \sum_{m=1}^n \mathbb{E}\left[ Z_m \,;\, Z_m > \varepsilon \sum_{i=1}^n \mathbb{E}[Z_i] \right] \;\to\; 0 \qquad \text{for each } \varepsilon > 0 . \tag{3} \]

Then it is the case that as $n \to \infty$ so

\[ \frac{1}{\sum_{m=1}^n \mathbb{E}[Z_m]} \sum_{m=1}^n Z_m \;\to\; 1 \qquad \text{in probability} . \tag{4} \]

This theorem follows directly from Chow and Teicher (2003, Chapter 10,
Theorem 1, Corollary 2).

Remark 1. The condition (3) can be viewed as an equation of Lindeberg type.
Indeed, if $W_1, W_2, \ldots$ are independent real random variables with $\mathbb{E}[W_m] = 0$
and such that $W_m^2 = Z_m$, then (3) corresponds exactly to the usual Lindeberg
condition for the sequence $\{W_m : m \geq 1\}$. Thus Chow and Teicher (2003, Chapter 10,
Theorem 1, Corollary 2) signals the close connection between weak laws
of large numbers and the central limit theorem.
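
To make the Lindeberg-type quantity concrete, here is a small sketch which evaluates the ratio (1/sum E[Z_m]) * sum E[Z_m ; Z_m > eps * sum E[Z_i]] in closed form. It assumes, purely for illustration, that each Z_m is exponentially distributed (so that E[Z ; Z > c] = (c + mu) exp(-c/mu) for mean mu); neither this distributional choice nor the function names come from the text. With equal means the ratio is negligible for large n, whereas with geometrically growing means the last few summands dominate and the ratio stays bounded away from zero.

```python
import math

def truncated_mean_exponential(mean, threshold):
    """E[Z ; Z > threshold] for an exponential random variable Z with the given mean."""
    return (threshold + mean) * math.exp(-threshold / mean)

def lindeberg_ratio(means, eps):
    """Lindeberg-type ratio for independent exponential Z_m with the given means.

    Returns (1 / sum E[Z_m]) * sum E[Z_m ; Z_m > eps * sum E[Z_i]].
    """
    total = sum(means)
    tail = sum(truncated_mean_exponential(mu, eps * total) for mu in means)
    return tail / total

# Identically distributed summands: the ratio is tiny once n is large.
ratio_iid = lindeberg_ratio([1.0] * 1000, 0.1)

# Means doubling at each step: the largest summands carry most of the total,
# so the ratio does not become small.
ratio_geometric = lindeberg_ratio([2.0 ** m for m in range(1, 41)], 0.1)
```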

Our strategy for proving a weak law of large numbers for non-identically
distributed $\mathcal{X}$-valued random variables is as follows: consider the condition (3)
applied to the case $Z_m^{(x)} = \tfrac12 \operatorname{dist}(X_m, x)^2$, and then apply the corresponding
weak laws of large numbers (4). Under suitable additional conditions the aggregate
empirical energy functions $\sum_{m=1}^n Z_m^{(x)} = \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, x)^2$ can be
made to approximate the aggregate energy functions $\phi_n(x)$ closely enough to
ensure that the uniform local minimum property forces convergence to 1 of the
probability of there being local empirical Frechet means close to $o$.

For a useful result it is preferable to require that the Lindeberg-type condition
apply only at the chosen reference point $o$. For a general metric space $\mathcal{X}$ we
should not expect the Lindeberg-type condition for the $Z_m^{(o)}$ to imply the
corresponding conditions obtained when $o$ is replaced by a general $x \in \mathcal{X}$. However
we can prove a partial result in this direction, which will be sufficient for our
purposes:

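Lemma 1. Suppose as above that $\mathcal{X}$ is a separable metric space. Let $X_1, X_2, \ldots$
be independent $\mathcal{X}$-valued random variables with finite energy functions. The
following conditions of Lindeberg-type are equivalent:

Firstly, a local Lindeberg condition:

\[ \frac{1}{\phi_n(x)} \sum_{m=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_m, x)^2 \,;\, \tfrac12 \operatorname{dist}(X_m, x)^2 > \varepsilon \phi_n(x) \right] \;\to\; 0 \tag{5} \]

as $n \to \infty$ for each $\varepsilon > 0$.

Secondly, a semi-global Lindeberg condition:

\[ \frac{1}{n \phi_n(x)} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, X_j)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, X_j)^2 > \varepsilon \phi_n(x) \right] \;\to\; 0 \tag{6} \]

as $n \to \infty$ for each $\varepsilon > 0$.

Remark 2. Note that the presence of $\phi_n(x)$ in (6) means that this semi-global
condition is not truly global, since $\phi_n(x) = \sum_{m=1}^n \mathbb{E}[\tfrac12 \operatorname{dist}(X_m, x)^2]$ depends
implicitly on the choice of $x \in \mathcal{X}$.

Proof. First suppose that the local condition (5) holds. We shall use this to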
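produce an upper bound on the quantity on the left-hand side of (6). Indeed

\[
\frac{1}{n \phi_n(x)} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, X_j)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, X_j)^2 > \varepsilon \phi_n(x) \right]
\;\leq\;
\frac{4}{\phi_n(x)} \sum_{i=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, x)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, x)^2 > \tfrac{\varepsilon}{4} \phi_n(x) \right]
+ \frac{4}{n} \sum_{i=1}^n \mathbb{P}\left[ \tfrac12 \operatorname{dist}(X_i, x)^2 > \tfrac{\varepsilon}{4} \phi_n(x) \right] .
\]

Here we make direct use of the triangle inequality via

\[ \operatorname{dist}(X_i, X_j)^2 \;\leq\; 2 \operatorname{dist}(X_i, x)^2 + 2 \operatorname{dist}(X_j, x)^2 ; \]

in particular the condition that $\operatorname{dist}(X_i, X_j) > \sqrt{2 \varepsilon \phi_n(x)}$ implies that at least
one of $\operatorname{dist}(X_i, x) > \tfrac12 \sqrt{2 \varepsilon \phi_n(x)}$ or $\operatorname{dist}(X_j, x) > \tfrac12 \sqrt{2 \varepsilon \phi_n(x)}$ must hold.
The Markov inequality implies that

\[
\frac{4}{n} \sum_{i=1}^n \mathbb{P}\left[ \tfrac12 \operatorname{dist}(X_i, x)^2 > \tfrac{\varepsilon}{4} \phi_n(x) \right]
\;\leq\;
\frac{16}{\varepsilon n \phi_n(x)} \sum_{i=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, x)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, x)^2 > \tfrac{\varepsilon}{4} \phi_n(x) \right]
\]

and therefore we obtain

\[
\frac{1}{n \phi_n(x)} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, X_j)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, X_j)^2 > \varepsilon \phi_n(x) \right]
\;\leq\;
4 \left( 1 + \frac{4}{\varepsilon n} \right) \frac{1}{\phi_n(x)} \sum_{i=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, x)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, x)^2 > \tfrac{\varepsilon}{4} \phi_n(x) \right] .
\]

For any $\varepsilon > 0$ this upper bound tends to zero as $n \to \infty$, by (5), and therefore
we obtain (6).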

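Now suppose on the other hand that the semi-global condition (6) holds.
If $\operatorname{dist}(X_i, x) > \sqrt{2 \varepsilon \phi_n(x)}$ and $\operatorname{dist}(X_j, x) < \tfrac12 \sqrt{2 \varepsilon \phi_n(x)}$ then it follows that
$\operatorname{dist}(X_i, X_j) > \tfrac12 \sqrt{2 \varepsilon \phi_n(x)}$. We deduce that

\[
\frac{1}{n \phi_n(x)} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, X_j)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, X_j)^2 > \tfrac{\varepsilon}{4} \phi_n(x) \right]
\;\geq\;
\frac{1}{n \phi_n(x)} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, X_j)^2 \,;\, \operatorname{dist}(X_i, x) > \sqrt{2 \varepsilon \phi_n(x)},\ \operatorname{dist}(X_j, x) < \tfrac12 \sqrt{2 \varepsilon \phi_n(x)} \right] .
\]

If $\operatorname{dist}(X_i, x) > \sqrt{2 \varepsilon \phi_n(x)}$ and $\operatorname{dist}(X_j, x) < \tfrac12 \sqrt{2 \varepsilon \phi_n(x)}$ then
$\operatorname{dist}(X_i, X_j) \geq \operatorname{dist}(X_i, x) - \operatorname{dist}(X_j, x) > \operatorname{dist}(X_i, x) - \tfrac12 \sqrt{2 \varepsilon \phi_n(x)} > \tfrac12 \operatorname{dist}(X_i, x)$,
and so (using independence, and noting that the two conditions cannot hold
simultaneously when $j = i$)

\[
\frac{1}{n \phi_n(x)} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, X_j)^2 \,;\, \operatorname{dist}(X_i, x) > \sqrt{2 \varepsilon \phi_n(x)},\ \operatorname{dist}(X_j, x) < \tfrac12 \sqrt{2 \varepsilon \phi_n(x)} \right]
\;\geq\;
\frac{1}{4 \phi_n(x)} \sum_{i=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, x)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, x)^2 > \varepsilon \phi_n(x) \right]
\times \frac{1}{n} \sum_{j \neq i} \mathbb{P}\left[ \tfrac12 \operatorname{dist}(X_j, x)^2 < \tfrac{\varepsilon}{4} \phi_n(x) \right] .
\]

Finally we take complements and use Markov's inequality to deduce

\[
\frac{1}{n} \sum_{j \neq i} \mathbb{P}\left[ \tfrac12 \operatorname{dist}(X_j, x)^2 < \tfrac{\varepsilon}{4} \phi_n(x) \right]
\;\geq\;
1 - \frac1n - \frac{4}{n \varepsilon \phi_n(x)} \sum_{j=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_j, x)^2 \,;\, \tfrac12 \operatorname{dist}(X_j, x)^2 \geq \tfrac{\varepsilon}{4} \phi_n(x) \right]
\;\geq\;
1 - \frac1n - \frac{4}{n \varepsilon}
\;\geq\; \frac12
\qquad \text{once } n \geq 2\left(1 + \frac{4}{\varepsilon}\right) .
\]

Taking $n \geq 2(1 + \tfrac{4}{\varepsilon})$, we deduce that (6) implies (5) by arguing that

\[
\frac{1}{n \phi_n(x)} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, X_j)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, X_j)^2 > \tfrac{\varepsilon}{4} \phi_n(x) \right]
\;\geq\;
\frac18 \, \frac{1}{\phi_n(x)} \sum_{i=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, x)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, x)^2 > \varepsilon \phi_n(x) \right] .
\]

This establishes the equivalence of local and semi-global conditions.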



□ 



Effective use of the semi-global Lindeberg condition depends on a lower bound
on the growth of the energy function $\phi_n(y)$ as $\operatorname{dist}(o, y)$ increases.

Lemma 2. Suppose as above that $\mathcal{X}$ is a separable metric space. Let $X_1, X_2, \ldots$
be $\mathcal{X}$-valued random variables with finite energy functions. Suppose that the
aggregate energy function $\phi_n(y) = \sum_{m=1}^n \mathbb{E}[\tfrac12 \operatorname{dist}(X_m, y)^2]$ attains its minimum
over $\mathcal{X}$ at $y = o$:

\[ \phi_n(o) \;\leq\; \phi_n(y) . \tag{7} \]

Then the aggregate energy function grows at least linearly at any $y \neq o$:

\[ \phi_n(y) \;\geq\; \frac{\operatorname{dist}(y, o)^2}{16} \, n . \tag{8} \]

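Proof. For convenience, set $\rho = \operatorname{dist}(y, o)$. If $\phi_n(o) \geq \rho^2 n / 16$ then (8) follows
from inequality (7). So we can suppose that $\phi_n(o) < \rho^2 n / 16$.

For additional convenience let $M$ be a random integer chosen uniformly from
$\{1, 2, \ldots, n\}$ (independently of $X_1, \ldots, X_n$). Then

\[
\begin{aligned}
\frac1n \phi_n(y) \;=\; \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_M, y)^2 \right]
&\;\geq\; \frac{\rho^2}{8} \, \mathbb{P}\left[ \operatorname{dist}(X_M, y) \geq \frac{\rho}{2} \right] && \text{(Markov inequality)} \\
&\;\geq\; \frac{\rho^2}{8} \, \mathbb{P}\left[ \operatorname{dist}(X_M, o) \leq \frac{\rho}{2} \right] && \text{(triangle inequality)} \\
&\;=\; \frac{\rho^2}{8} \left( 1 - \mathbb{P}\left[ \tfrac12 \operatorname{dist}(X_M, o)^2 > \frac{\rho^2}{8} \right] \right) \\
&\;\geq\; \frac{\rho^2}{8} \left( 1 - \frac{8}{\rho^2} \, \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_M, o)^2 \right] \right) && \text{(Markov inequality again)} \\
&\;=\; \frac{\rho^2}{8} \left( 1 - \frac{8}{\rho^2} \, \frac{\phi_n(o)}{n} \right)
\;>\; \frac{\rho^2}{8} \left( 1 - \frac{8}{16} \right) \;=\; \frac{\rho^2}{16} .
\end{aligned}
\]

So (8) follows in this case also.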



□ 
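
The bound (8) can be checked numerically in a toy setting. The sketch below is an illustration under stated assumptions, not part of the argument: it takes the metric space to be a finite grid of 360 points on the unit circle with arc-length distance, treats each X_m as a point mass at a randomly chosen grid point (so that the aggregate energy is a plain sum), locates a global minimizer o by exhaustive search, and reports the smallest value of the aggregate energy at y minus n * dist(y, o)^2 / 16 over the grid. All names, the grid and the seed are hypothetical choices made for this example.

```python
import math
import random

def circle_dist(a, b):
    """Arc-length distance between two angles (in radians) on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def aggregate_energy(points, y):
    """Sum over m of (1/2) * dist(x_m, y)^2, each X_m being a point mass at x_m."""
    return sum(0.5 * circle_dist(x, y) ** 2 for x in points)

def lemma2_slack(points, space):
    """Return a global minimizer o over `space` and the smallest slack in bound (8).

    The slack at y is aggregate_energy(y) - n * dist(y, o)^2 / 16; the lower bound
    on the growth of the energy predicts that it is never negative.
    """
    o = min(space, key=lambda y: aggregate_energy(points, y))
    n = len(points)
    slack = min(
        aggregate_energy(points, y) - n * circle_dist(y, o) ** 2 / 16
        for y in space
    )
    return o, slack

rng = random.Random(0)  # fixed seed so that the example is reproducible
space = [2 * math.pi * k / 360 for k in range(360)]
points = [rng.choice(space) for _ in range(25)]
o, slack = lemma2_slack(points, space)
```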



We are now in a position to state and prove the main result of this section. 
We follow Bhattacharya and Patrangenaru (2003) by imposing the compactness 
of bounded closed sets, and also impose the uniform local minimum property 
described above by Inequality (2). 

Theorem 2. Suppose $\mathcal{X}$ is a separable metric space for which all bounded closed
sets are compact. Let $X_1, X_2, \ldots$ be independent non-identically distributed
$\mathcal{X}$-valued random variables such that $\mathbb{E}[\tfrac12 \operatorname{dist}(X_m, o)^2] < \infty$ for a given reference
point $o \in \mathcal{X}$ (hence for all points in $\mathcal{X}$), for each $m$. Suppose also that the
uniform local minimum property obtains: there is fixed finite $\rho_1 > 0$ such that
Inequality (2) holds for each positive $\rho_0 < \rho_1$. Thus there is $\kappa = \kappa(\rho_0, \rho_1)$ such
that $(1 + \kappa) \phi_n(o)$ (for the aggregate energy function $\phi_n$ specified above) is a strict
lower bound for the values of $\phi_n$ on the annulus centred at $o$ and defined by radii
$\rho_0$, $\rho_1$. Finally, suppose that the $X_m$ satisfy a local condition of Lindeberg type
at $o$: for each $\varepsilon > 0$, as $n \to \infty$ so

\[ \frac{1}{\phi_n(o)} \sum_{m=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_m, o)^2 \,;\, \tfrac12 \operatorname{dist}(X_m, o)^2 > \varepsilon \phi_n(o) \right] \;\to\; 0 . \tag{9} \]

Consider any measurable choice of a sequence of local minimizers

\[ \mathcal{E}(X_1, \ldots, X_n) \;\in\; \operatorname*{arg\,inf}_{x \in \operatorname{ball}(o, \rho_1)} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, x)^2 . \]

There exists at least one such sequence such that

\[ \mathbb{P}\left[ \mathcal{E}(X_1, \ldots, X_n) \in \operatorname{ball}(o, \rho_0) \right] \;\to\; 1 , \]

and for any such sequence $\mathcal{E}(X_1, \ldots, X_n) \to o$ in probability.

Proof. First note that global (and hence also local) minimizers of the aggregate
empirical energy function always exist and are confined to an almost surely
bounded region: indeed global minimizers for the sample $X_1, \ldots, X_n$ are simply
conventional Frechet means of the $n$-point empirical distribution, and the
argument of Bhattacharya and Patrangenaru (2003, Theorem 2.1) applies (this
theorem is stated for Riemannian manifolds, but the portion relating to existence
within a bounded region is a purely metric space argument, using the
compactness of bounded sets).

Evidently it suffices to show that Inequality (2) has high probability of being
replicated at the empirical level: it is enough to show that the following
probability converges to 1 as $n \to \infty$ for each positive $\rho_0 < \rho_1$:

\[ \mathbb{P}\left[ \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, o)^2 \;<\; \inf\left\{ \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, y)^2 : \rho_0 < \operatorname{dist}(y, o) < \rho_1 \right\} \right] . \tag{10} \]

For then it follows immediately that any sequence of local minimizers of the
aggregate empirical energy function restricted to $\operatorname{ball}(o, \rho_1)$

\[ \mathcal{E}(X_1, \ldots, X_n) \;\in\; \operatorname*{arg\,inf}_{x \in \operatorname{ball}(o, \rho_1)} \left\{ \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, x)^2 \right\} \]

must (as $n \to \infty$) eventually have arbitrarily high probability of lying in $\operatorname{ball}(o, \rho_0)$,
and must in this event be a local minimizer of the unrestricted aggregate empirical
energy function. Since (10) holds for each positive $\rho_0 < \rho_1$, we may deduce
that $\operatorname{dist}(\mathcal{E}(X_1, \ldots, X_n), o) \to 0$ in probability.

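To begin the proof, first note that the result follows trivially if $\phi_n(o) = 0$ for
all $n$, for then $X_m = o$ almost surely for all $m$. Otherwise by Theorem 1

\[ \frac{1}{\phi_n(o)} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, o)^2 \;\to\; 1 \qquad \text{in probability.} \tag{11} \]

Furthermore Lemma 1 and (9) show that, for each $\varepsilon > 0$, as $n \to \infty$ so

\[ \frac{1}{n \phi_n(o)} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, X_j)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, X_j)^2 > \varepsilon \phi_n(o) \right] \;\to\; 0 . \]

Moreover (2) implies that if $\rho_0 < \operatorname{dist}(y, o) < \rho_1$ then also

\[ \frac{1}{n \phi_n(y)} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_i, X_j)^2 \,;\, \tfrac12 \operatorname{dist}(X_i, X_j)^2 > \varepsilon \phi_n(y) \right] \;\to\; 0 . \]

A further application of Lemma 1 then shows that, for each $\varepsilon > 0$, as $n \to \infty$

\[ \frac{1}{\phi_n(y)} \sum_{m=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_m, y)^2 \,;\, \tfrac12 \operatorname{dist}(X_m, y)^2 > \varepsilon \phi_n(y) \right] \;\to\; 0 . \]

Consequently we may also deduce that if $\rho_0 < \operatorname{dist}(y, o) < \rho_1$ then

\[ \frac{1}{\phi_n(y)} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, y)^2 \;\to\; 1 \qquad \text{in probability.} \]
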



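Now we have established suitable convergence in probability for the energy
functions, but only holding pointwise not uniformly. Were we able to uniformize
this over the whole of the annulus $A(\rho_0, \rho_1) = \{ y : \rho_0 < \operatorname{dist}(y, o) < \rho_1 \}$,
and were we able to overcome the distinction between $\phi_n(o)$ and $\phi_n(y)$ for
$y \in A(\rho_0, \rho_1)$, then we would achieve the required convergence for (10) via
Inequality (2). Following Bhattacharya and Patrangenaru (2003), we do this
by selecting $y_1, \ldots, y_k$ from $A(\rho_0, \rho_1)$ to form a finite $\delta$-net for $A(\rho_0, \rho_1)$, for
suitably small $\delta > 0$. Consider two points $y, z \in A(\rho_0, \rho_1)$ with $\operatorname{dist}(y, z) < \delta$.
Then we can use $\operatorname{dist}(X_m, y) \leq 1 + \operatorname{dist}(X_m, y)^2$ to deduce

\[ \operatorname{dist}(X_m, z)^2 \;\leq\; (\operatorname{dist}(X_m, y) + \delta)^2 \;\leq\; (1 + 2\delta) \operatorname{dist}(X_m, y)^2 + (2 + \delta)\delta , \]

likewise

\[ \operatorname{dist}(X_m, y)^2 \;\leq\; (1 + 2\delta) \operatorname{dist}(X_m, z)^2 + (2 + \delta)\delta . \]

Applying this to whichever is the larger of $\operatorname{dist}(X_m, y)^2$, $\operatorname{dist}(X_m, z)^2$, and then
using $\operatorname{dist}(X_m, z)^2 \leq (\operatorname{dist}(X_m, y) + \delta)^2 \leq 2 \operatorname{dist}(X_m, y)^2 + 2\delta^2$,

\[
\begin{aligned}
\left| \operatorname{dist}(X_m, z)^2 - \operatorname{dist}(X_m, y)^2 \right|
&\;\leq\; 2\delta \max\{ \operatorname{dist}(X_m, z)^2, \operatorname{dist}(X_m, y)^2 \} + (2 + \delta)\delta \\
&\;\leq\; 4\delta \operatorname{dist}(X_m, y)^2 + 4\delta^3 + (2 + \delta)\delta
\;=\; \left( 4 \operatorname{dist}(X_m, y)^2 + 4\delta^2 + \delta + 2 \right) \delta .
\end{aligned}
\]

For $z \in A(\rho_0, \rho_1)$, choose $p(z)$ to be an element of the $\delta$-net which is closest to
$z$. Then the above implies that

\[
\left| \frac{1}{\phi_n(p(z))} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, z)^2 - \frac{1}{\phi_n(p(z))} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, p(z))^2 \right|
\;\leq\;
\left( \frac{4}{\phi_n(p(z))} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, p(z))^2 + \frac{4\delta^2 + \delta + 2}{2} \, \frac{n}{\phi_n(p(z))} \right) \delta .
\]

Thus we establish useful limiting bounds holding in probability as $n \to \infty$ so
long as we can show that if $y \in A(\rho_0, \rho_1)$ then

\[ \liminf_{n \to \infty} \frac{\phi_n(y)}{n} \;>\; 0 . \]
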
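But this follows (with an explicit lower bound) from Lemma 2: hence

\[
\sup\left\{ \left| \frac{1}{\phi_n(p(z))} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, z)^2 - \frac{1}{\phi_n(p(z))} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, p(z))^2 \right| : z \in A(\rho_0, \rho_1) \right\}
\;\leq\;
\max_{i = 1, \ldots, k} \left( \frac{4}{\phi_n(y_i)} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, y_i)^2 + \frac{4\delta^2 + \delta + 2}{2} \, \frac{16}{\rho_0^2} \right) \delta .
\]

Consequently, once $\rho_0$ is fixed, for any $\varepsilon > 0$ we can choose $\delta$ small enough so
that with probability tending to 1 as $n \to \infty$

\[ \sup\left\{ \left| \frac{1}{\phi_n(p(z))} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, z)^2 - 1 \right| : z \in A(\rho_0, \rho_1) \right\} \;<\; \frac{\varepsilon}{2} . \]

We now use (2) to deduce that with probability tending to 1 as $n \to \infty$

\[
\begin{aligned}
\inf\left\{ \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, z)^2 : z \in A(\rho_0, \rho_1) \right\}
&\;\geq\; (1 + \kappa) \phi_n(o) \, \inf\left\{ \frac{1}{\phi_n(p(z))} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, z)^2 : z \in A(\rho_0, \rho_1) \right\} \\
&\;\geq\; \left( 1 - \frac{\varepsilon}{2} \right) (1 + \kappa) \phi_n(o)
\;>\; (1 - \varepsilon)(1 + \kappa) \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, o)^2 ,
\end{aligned}
\]

where the last step uses the convergence in probability noted in (11). Taking
$\varepsilon$ small enough that $(1 - \varepsilon)(1 + \kappa) \geq 1$, this
establishes that the quantity in (10) must converge to 1; this completes the
proof of the theorem. □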

We have therefore shown that sequences of local empirical Frechet means 
must converge in probability to a reference point o when this reference point is 
uniformly a strict local minimum of the aggregate energy function so long as a 
condition of Lindeberg-type is satisfied at o. Under the additional condition of a 
linear bound on the growth of $\phi_n(o)$ it is possible also to control the behaviour
of global minimizers and derive a result for global empirical Frechet means. 

Corollary 1. In the situation of Theorem 2, suppose that condition (2) holds
for all positive $\rho_1$ (thus in particular $o$ is the unique global Frechet mean), and
suppose in addition that there is a positive constant $C$ such that

\[ \limsup_{n \to \infty} \frac1n \phi_n(o) \;=\; \limsup_{n \to \infty} \frac1n \sum_{m=1}^n \mathbb{E}\left[ \tfrac12 \operatorname{dist}(X_m, o)^2 \right] \;<\; C^2 . \tag{12} \]

Then any measurably selected sequence of local empirical Frechet means
converges to $o$ in probability.

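Proof. Following the proof of Theorem 2, it would suffice to show that, for
sufficiently large $\rho_1$,

\[ \mathbb{P}\left[ \frac1n \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, o)^2 + 1 \;<\; \inf\left\{ \frac1n \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, y)^2 : \operatorname{dist}(y, o) > \rho_1 \right\} \right] \]

converges to 1 as $n \to \infty$. To establish this, we once again adapt methods from
the proof of Bhattacharya and Patrangenaru (2003, Theorem 2.3). First observe
that we can apply the Cauchy-Schwarz inequality to show that

\[
\begin{aligned}
\frac1n \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, y)^2
&\;\geq\; \frac1n \sum_{m=1}^n \tfrac12 \left( \operatorname{dist}(X_m, o) - \operatorname{dist}(y, o) \right)^2 \\
&\;\geq\; \tfrac12 \operatorname{dist}(y, o)^2 + \frac1n \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, o)^2 - \sqrt{2} \operatorname{dist}(y, o) \sqrt{ \frac1n \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, o)^2 } .
\end{aligned}
\]

As before, if $\phi_n(o) = 0$ for all $n$ then the Corollary follows immediately.
Otherwise from Theorem 1 and the local Lindeberg condition we know that

\[ \frac{1}{\phi_n(o)} \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, o)^2 \;\to\; 1 \qquad \text{in probability,} \]

and hence the growth condition (12) shows that as $n \to \infty$ so (for example)

\[ \mathbb{P}\left[ \frac1n \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, o)^2 < 2 C^2 \right] \;\to\; 1 . \]

This can be applied as follows: if we choose $\rho_1$ to exceed $2C + \sqrt{2 + 4C^2}$ then,
with probability increasing to 1 as $n \to \infty$,

\[ \tfrac12 \operatorname{dist}(y, o)^2 - \sqrt{2} \operatorname{dist}(y, o) \sqrt{ \frac1n \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, o)^2 } - 1 \;>\; 0 \]

once $\operatorname{dist}(y, o) > \rho_1$. Consequently as $n \to \infty$ so

\[ \mathbb{P}\left[ \frac1n \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, o)^2 + 1 \;<\; \inf\left\{ \frac1n \sum_{m=1}^n \tfrac12 \operatorname{dist}(X_m, y)^2 : \operatorname{dist}(y, o) > \rho_1 \right\} \right] \]

converges to 1 as required.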



□ 



4. Euclidean interlude 



Before we turn to the central limit theorem on Riemannian manifolds, it is help- 
ful to prove a modest variant on the usual central limit theorem for independent 
Euclidean (vector-valued) random variables, which may be of independent interest,
and which could be argued to capture more precisely the conventional
statistical use of the idea of a central limit theorem. The reader will see that the 
arguments in this section are almost entirely classical (see for example Feller, 
1966) and the main issue is simply to formulate the result. However we give 
complete proofs since we have not been able to trace general forms of these 
results in the literature, and also because the classical proofs must be adapted 
to the vector-valued nature of the summands. 

A natural condition for central limit approximation for normalized partial
sums of $d$-dimensional mean-zero finite-variance independent random vectors
$Y_1, \ldots, Y_n, \ldots$ is that they should satisfy a variant of Lindeberg's condition; for
each $\varepsilon > 0$, as $n \to \infty$ so

\[ \frac{1}{\phi_n} \sum_{m=1}^n \mathbb{E}\left[ \|Y_m\|^2 \,;\, \|Y_m\|^2 > \varepsilon \phi_n \right] \;\to\; 0 . \tag{13} \]

Here we abbreviate $\phi_n = \sum_{m=1}^n \mathbb{E}[\tfrac12 \|Y_m\|^2]$; this parallels the $\phi_n(o)$ used in
Sections 3 and 5 and leads us to consider the normalized sums $(Y_1 + \ldots + Y_n)/\sqrt{2\phi_n}$.
(The factor $\tfrac12$ is awkward in the Euclidean context, but eases details of
calculations later in the geometric context of Section 5.) Note that (13) corresponds
exactly to the local condition of Lindeberg type (5) for $X_1, X_2, \ldots$. However
it should be clear that (13) cannot be sufficient to establish weak convergence
to normality of $(Y_1 + \ldots + Y_n)/\sqrt{2\phi_n}$; consider two-dimensional examples in
which the sequence $Y_1, Y_2, \ldots$ alternates between longer and longer stretches of
$\mathcal{L}(Y_k) = (N(0, 1), 0)$ versus longer and longer stretches of $\mathcal{L}(Y_k) = (0, N(0, 1))$.
So we cannot hope for a central limit theorem (thus the Cramer-Wold device is
inapplicable); however it is the case that in fact (13) implies a central
approximation theorem.
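
The two-dimensional example can be made explicit with a short computation. The sketch below assumes, for illustration only, that the stretches have lengths 1, 2, 4, 8, ... and alternate between the first and second coordinate axes, starting with the first; the function name is hypothetical. Since each summand has unit variance along exactly one axis, the normalized sum has variance-covariance matrix diag(s, 1 - s), where s is the share of the first n summands lying along the first axis. This share keeps swinging between roughly 2/3 and exactly 1/3, so the covariance of the normalized sums has no limit even though its trace is always 1.

```python
def covariance_diagonal(n):
    """Diagonal of the covariance matrix of (Y_1 + ... + Y_n) / sqrt(n).

    Assumed set-up for this sketch: stretches of lengths 1, 2, 4, ... alternate
    between Y_m = (N(0,1), 0) and Y_m = (0, N(0,1)), starting on the first axis.
    """
    on_first_axis = 0   # number of summands among the first n on the first axis
    used = 0            # number of summands accounted for so far
    length, axis = 1, 0
    while used < n:
        take = min(length, n - used)
        if axis == 0:
            on_first_axis += take
        used += take
        length *= 2
        axis = 1 - axis
    share = on_first_axis / n
    return (share, 1.0 - share)

# At the end of a first-axis stretch the share is close to 2/3 ...
after_first_axis_stretch = covariance_diagonal(2 ** 11 - 1)
# ... and at the end of the following second-axis stretch it is exactly 1/3.
after_second_axis_stretch = covariance_diagonal(2 ** 12 - 1)
```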

In order to describe the result we first recall that the topology of weak
convergence of probability measures can be metrized using a truncated Wasserstein
distance

\[ W_1(\mu, \nu) \;=\; \inf\left\{ \mathbb{E}\left[ 1 \wedge \|U - V\| \right] : \mathcal{L}(U) = \mu, \ \mathcal{L}(V) = \nu \right\} \tag{14} \]

(see for example Villani, 2003, Chapter 7). Moreover by Kantorovich-Rubinstein
representation (Villani, 2003, Remark 7.5(i)) we may write

\[ W_1(\mu, \nu) \;=\; \sup\left\{ \int f \, \mathrm{d}(\mu - \nu) : f \text{ is Lip(1) for distance } 1 \wedge \|x - y\| \right\} . \tag{15} \]

We now consider when the law of $(Y_1 + \ldots + Y_n)/\sqrt{2\phi_n}$ draws ever closer to the
matching (but varying) multivariate normal distribution as $n \to \infty$:

Theorem 3 (Lindeberg central approximation theorem for vector-valued random
variables). Suppose that $Y_1, \ldots, Y_n, \ldots$ are independent zero-mean random
$d$-dimensional vectors with finite variance-covariance matrices and that the
above variant of Lindeberg's condition (13) is satisfied. Then

\[ W_1\left( \mathcal{L}\left( \frac{Y_1 + \ldots + Y_n}{\sqrt{2\phi_n}} \right), \ \mathcal{L}(Z_n) \right) \;\to\; 0 , \]

where $\phi_n = \sum_{m=1}^n \mathbb{E}[\tfrac12 \|Y_m\|^2]$ and $Z_n$ has the multivariate $d$-dimensional normal
distribution of zero mean and variance-covariance matrix $V_n$, with

\[ u^{\top} V_n u \;=\; \frac{1}{2\phi_n} \sum_{m=1}^n \mathbb{E}\left[ \langle u, Y_m \rangle^2 \right] \qquad \text{for all vectors } u . \tag{16} \]




Remark 3. Note that the variance-covariance matrix $V_n$ has unit trace.
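
The unit-trace property can be verified in one line. The following worked computation is a sketch which assumes that the defining relation for $V_n$ reads $u^{\top} V_n u = (2\phi_n)^{-1} \sum_m \mathbb{E}[\langle u, Y_m \rangle^2]$ with $\phi_n = \sum_m \mathbb{E}[\tfrac12 \|Y_m\|^2]$, and writes $e_1, \ldots, e_d$ for the standard basis vectors (a notation introduced here only for this check).

```latex
\[
\operatorname{tr} V_n
  \;=\; \sum_{k=1}^{d} e_k^{\top} V_n e_k
  \;=\; \frac{1}{2\phi_n} \sum_{m=1}^{n} \mathbb{E}\Bigl[ \sum_{k=1}^{d} \langle e_k, Y_m \rangle^2 \Bigr]
  \;=\; \frac{1}{2\phi_n} \sum_{m=1}^{n} \mathbb{E}\bigl[ \|Y_m\|^2 \bigr]
  \;=\; \frac{2\phi_n}{2\phi_n}
  \;=\; 1 .
\]
```

In particular every eigenvalue of $V_n$ lies in $[0, 1]$, so that $u^{\top} V_n u \leq \|u\|^2$ for all vectors $u$.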

Proof. The proof is based heavily on the classic proof of the Feller-Lindeberg
central limit theorem using characteristic functions. First of all, observe that it
is a consequence of the variant Lindeberg condition that

\[ \sup_{m = 1, \ldots, n} \frac{1}{\phi_n} \mathbb{E}\left[ \|Y_m\|^2 \right] \;\to\; 0 . \]

For otherwise we can find a subsequence $\{n_r\}$ and $m_r \in \{1, \ldots, n_r\}$ such that
for some positive $c > 0$ we have $\mathbb{E}[\|Y_{m_r}\|^2] > c \, \phi_{n_r}$ for all $r$, and if we choose
$\varepsilon < c$ then this implies that

\[ \sum_{m=1}^{n_r} \mathbb{E}\left[ \|Y_m\|^2 \,;\, \|Y_m\|^2 > \varepsilon \phi_{n_r} \right] \;\geq\; \mathbb{E}\left[ \|Y_{m_r}\|^2 \,;\, \|Y_{m_r}\|^2 > \varepsilon \phi_{n_r} \right] \;\geq\; (c - \varepsilon) \phi_{n_r} . \]

Choosing $\varepsilon < c$, this contradicts the variant Lindeberg condition (13). Thus we
can choose $N = N(u)$ large enough that $\mathbb{E}[\langle u, Y_m \rangle^2] < \phi_n$ for all $m = 1, \ldots, n$
and all $n \geq N(u)$.

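Using independence, set

\[ \Psi_n(u) \;=\; \mathbb{E}\left[ \exp\left( i \left\langle u, \frac{Y_1 + \ldots + Y_n}{\sqrt{2\phi_n}} \right\rangle \right) \right] \;=\; \prod_{m=1}^n \mathbb{E}\left[ \exp\left( i \, \frac{\langle u, Y_m \rangle}{\sqrt{2\phi_n}} \right) \right] . \]

By estimates based on Taylor expansion (Billingsley, 1986, Section 27),

\[
\begin{aligned}
\left| 1 + i \, \frac{\langle u, Y_m \rangle}{\sqrt{2\phi_n}} - \frac12 \frac{\langle u, Y_m \rangle^2}{2\phi_n} - \exp\left( i \, \frac{\langle u, Y_m \rangle}{\sqrt{2\phi_n}} \right) \right|
&\;\leq\; \min\left\{ \frac{|\langle u, Y_m \rangle|^3}{(2\phi_n)^{3/2}}, \ \frac{\langle u, Y_m \rangle^2}{2\phi_n} \right\} \\
&\;\leq\; \max\{1, \|u\|^3\} \left( \frac{\|Y_m\|^2}{2\phi_n} \, \mathbb{I}\left[ \|Y_m\|^2 > \varepsilon \phi_n \right] + \frac{\|Y_m\|^3}{(2\phi_n)^{3/2}} \, \mathbb{I}\left[ \|Y_m\|^2 \leq \varepsilon \phi_n \right] \right) \\
&\;\leq\; \max\{1, \|u\|^3\} \left( \frac{\|Y_m\|^2}{2\phi_n} \, \mathbb{I}\left[ \|Y_m\|^2 > \varepsilon \phi_n \right] + \sqrt{\frac{\varepsilon}{2}} \, \frac{\|Y_m\|^2}{2\phi_n} \right) .
\end{aligned}
\]
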
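Hence for $n \geq N(u)$, using $\mathbb{E}[Y_m] = 0$,

\[
\begin{aligned}
\left| \Psi_n(u) - \prod_{m=1}^n \left( 1 - \frac12 \frac{\mathbb{E}[\langle u, Y_m \rangle^2]}{2\phi_n} \right) \right|
&\;\leq\; \max\{1, \|u\|^3\} \left( \frac{1}{2\phi_n} \sum_{m=1}^n \mathbb{E}\left[ \|Y_m\|^2 \,;\, \|Y_m\|^2 > \varepsilon \phi_n \right] + \sqrt{\frac{\varepsilon}{2}} \sum_{m=1}^n \frac{\mathbb{E}[\|Y_m\|^2]}{2\phi_n} \right) \\
&\;=\; \max\{1, \|u\|^3\} \left( \frac{1}{2\phi_n} \sum_{m=1}^n \mathbb{E}\left[ \|Y_m\|^2 \,;\, \|Y_m\|^2 > \varepsilon \phi_n \right] + \sqrt{\frac{\varepsilon}{2}} \right)
\end{aligned}
\]

(recalling the definition of $\phi_n$ for the last step, and noting that for $n \geq N(u)$
we know that every $1 - \frac12 \mathbb{E}[\langle u, Y_m \rangle^2] / (2\phi_n)$ is non-negative and bounded above by 1).

Now invoke the inequality

\[ e^{-p/(1-p)} \;\leq\; 1 - p \;\leq\; e^{-p} , \]

valid for $0 \leq p < 1$. Since $p_m = \frac12 \mathbb{E}[\langle u, Y_m \rangle^2] / (2\phi_n) < \frac14$ when $n \geq N(u)$,

\[
\begin{aligned}
0 \;\leq\; - \frac12 \sum_{m=1}^n \frac{\mathbb{E}[\langle u, Y_m \rangle^2]}{2\phi_n} - \log \prod_{m=1}^n \left( 1 - \frac12 \frac{\mathbb{E}[\langle u, Y_m \rangle^2]}{2\phi_n} \right)
&\;\leq\; \sum_{m=1}^n \frac{p_m^2}{1 - p_m}
\;\leq\; \frac43 \left( \max_{m = 1, \ldots, n} p_m \right) \sum_{m=1}^n p_m \\
&\;\leq\; \frac12 \, \|u\|^4 \max_{m = 1, \ldots, n} \frac{\mathbb{E}[\|Y_m\|^2]}{2\phi_n} .
\end{aligned}
\]

Accordingly we may use (16) to deduce that if $n \geq N(u)$ then

\[ \left| \Psi_n(u) - \exp\left( - \tfrac12 u^{\top} V_n u \right) \right| \;\leq\; \max\{1, \|u\|^4\} \times A_n , \tag{17} \]

where

\[ A_n \;=\; \sqrt{\frac{\varepsilon}{2}} + \frac12 \max_{m = 1, \ldots, n} \frac{\mathbb{E}[\|Y_m\|^2]}{2\phi_n} + \frac{1}{2\phi_n} \sum_{m=1}^n \mathbb{E}\left[ \|Y_m\|^2 \,;\, \|Y_m\|^2 > \varepsilon \phi_n \right] . \tag{18} \]

Since $\varepsilon$ can be chosen to be arbitrarily small, and the variant Lindeberg
condition (13) implies the other quantities converge to 0, it follows that
$|\Psi_n(u) - \exp(-\tfrac12 u^{\top} V_n u)|$ converges to 0 for each fixed $u$.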

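We now convert this relationship between characteristic functions into an
inequality for the truncated Wasserstein distance between the corresponding
distributions. To this end we use a Parseval equality (Feller, 1966, XV.3):

\[ e^{-i \langle u, t \rangle} \, \Psi_n(u) \;=\; \mathbb{E}\left[ \exp\left( i \left\langle u, \frac{Y_1 + \ldots + Y_n}{\sqrt{2\phi_n}} - t \right\rangle \right) \right] . \]
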
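We can multiply by the symmetric $d$-dimensional normal density of variance
$\sigma^{-2}$, integrate with respect to $u$, and rearrange to obtain

$$\frac{1}{(2\pi)^d}\int_{\mathbb{R}^d} e^{-i\langle u, t\rangle}\,\Psi_n(u)\,e^{-\sigma^2\|u\|^2/2}\,du \;=\; \frac{1}{(2\pi\sigma^2)^{d/2}}\,\mathbb{E}\left[\exp\left(-\frac{1}{2\sigma^2}\left\|\frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} - t\right\|^2\right)\right]. \tag{19}$$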



The right-hand side (viewed as a function of $t$) is the density of $\frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'$,
where $Z'$ has a $d$-dimensional multivariate normal distribution of variance-covariance
matrix $\sigma^2 I_d$, independent of $\frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}}$. By definition (14) of
Wasserstein distance the truncated Wasserstein distance between the distribution
of $\frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}}$ and the distribution of $\frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'$ is bounded by

$$\mathbb{E}\left[\|Z'\|\right] \;\le\; \text{constant} \times \sigma .$$

Given any $\eta > 0$, we can choose $\sigma$ to make this smaller than $\eta/5$.

Choose $Z_n$ to be of $d$-dimensional multivariate normal distribution with
variance-covariance matrix $V_n$, independent of $Z'$. The truncated Wasserstein
distance between the distributions of $Z_n$ and $Z_n + Z'$ satisfies the same bound of
$\eta/5$. So consider bounds on the truncated Wasserstein distance between the
distributions of (a) $\frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'$, with density given by (19), and (b) $Z_n + Z'$, whose
density satisfies a similar formula but with the normal characteristic function
$\exp\left(-\tfrac12 u^\top V_n u\right)$ replacing $\Psi_n(u)$. By the Kantorovich-Rubinstein representation
(15) of the truncated Wasserstein distance we may consider

$$\mathbb{E}\left[f\left(\frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'\right)\right] - \mathbb{E}\left[f(Z_n + Z')\right]$$



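where $f$ is Lip(1) with respect to the truncated distance function $1 \wedge \|x - y\|$ (see
(15)). Without loss of generality we take $f(o) = 0$; the Lipschitz condition then
implies that $|f| \le 1$ (since the truncated distance $1 \wedge \|x - y\|$ is always bounded
above by $1$). Now both $\frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'$ and $Z_n + Z'$ have variance-covariance
matrices with traces bounded above by $1 + \sigma^2 d$; therefore once $\sigma$ is fixed we
may choose a large radius $R$ and deduce by Chebyshev that the distributions of
both $\frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'$ and $Z_n + Z'$ place probability mass of at most $\eta/5$ outside
the ball centred on $o$ and of radius $R$, so that

$$\left|\mathbb{E}\left[f\left(\tfrac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'\right);\ \left\|\tfrac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'\right\| > R\right]\right| \le \eta/5, \qquad \left|\mathbb{E}\left[f(Z_n + Z');\ \|Z_n + Z'\| > R\right]\right| \le \eta/5 .$$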
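Finally

$$\begin{aligned}
&\left|\mathbb{E}\left[f\left(\tfrac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'\right);\ \left\|\tfrac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'\right\| \le R\right] - \mathbb{E}\left[f(Z_n + Z');\ \|Z_n + Z'\| \le R\right]\right| \\
&\qquad\le\; \frac{1}{(2\pi)^d}\int_{\mathrm{ball}(o,R)} |f(t)|\,\left|\int_{\mathbb{R}^d} e^{-i\langle u,t\rangle}\left(\Psi_n(u) - e^{-\frac12 u^\top V_n u}\right)e^{-\sigma^2\|u\|^2/2}\,du\right|\,dt \\
&\qquad\le\; \frac{1}{(2\pi)^d}\int_{\mathrm{ball}(o,R)}\int_{\mathbb{R}^d} \left|\Psi_n(u) - e^{-\frac12 u^\top V_n u}\right|\,e^{-\sigma^2\|u\|^2/2}\,du\,dt .
\end{aligned}$$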

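Given $\sigma$ and $R$, the dominated convergence theorem allows us to choose $N$ (not
depending on $u$) to make this arbitrarily small for all $n > N$, hence

$$\left|\mathbb{E}\left[f\left(\tfrac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'\right);\ \left\|\tfrac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}} + Z'\right\| \le R\right] - \mathbb{E}\left[f(Z_n + Z');\ \|Z_n + Z'\| \le R\right]\right| \;\le\; \eta/5 ,$$

for all $n > N$. It therefore follows that for $n > N$ we obtain

$$\widetilde{W}_1\left(\frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}},\ Z_n\right) \;\le\; \eta ,$$

and since $\eta > 0$ was arbitrary the theorem follows. □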



The following converse to this result mirrors Feller's converse to Lindeberg's 
theorem. 

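Corollary 2 (Feller converse to Lindeberg central approximation theorem). In
the situation of Theorem 3, suppose that in place of the above variant of the
Lindeberg condition (13) it is the case that

$$\frac{\mathbb{E}\left[\|Y_n\|^2\right]}{\phi_n} \to 0 \quad\text{and}\quad \phi_n \to \infty , \tag{20}$$

and that

$$\widetilde{W}_1\left(\frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}},\ Z_n\right) \to 0 . \tag{21}$$

Then the Lindeberg condition (13) must be satisfied.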
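Proof. As a consequence of (20) and the fact that $\phi_n$ increases with $n$,

$$\lim_{n\to\infty}\max_{1\le m\le n}\frac{\mathbb{E}\left[\|Y_m\|^2\right]}{\phi_n} \;\le\; \lim_{n\to\infty}\max_{1\le m\le k}\frac{\mathbb{E}\left[\|Y_m\|^2\right]}{\phi_n} \;+\; \lim_{n\to\infty}\max_{k<m}\frac{\mathbb{E}\left[\|Y_m\|^2\right]}{\phi_m}$$

also tends to zero. For fixed $u \in \mathbb{R}^d$, the bounded Lipschitz nature of $\exp(i\langle u, x\rangle)$
as a function of $x$, applied to (21) and the Kantorovich-Rubinstein characterization
(15), together imply that

$$\mathbb{E}\left[\exp\left(i\left\langle u, \frac{Y_1+\ldots+Y_n}{\sqrt{2\phi_n}}\right\rangle\right)\right] - \exp\left(-\tfrac12\langle u, V_n u\rangle\right) \;\to\; 0 .$$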



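Since $V_n$ has unit trace, we can multiply through by $\exp\left(\tfrac12\langle u, V_n u\rangle\right)$, take logs
and use independence to see that

$$\tfrac12\langle u, V_n u\rangle + \sum_{m=1}^{n}\log \mathbb{E}\left[\exp\left(i\,\frac{\langle u, Y_m\rangle}{\sqrt{2\phi_n}}\right)\right] \;\to\; 0 .$$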



Standard estimates using Taylor expansion show that 



logE 



exp I 



- E 



exp I 



< 



E 



- 1 



exp I 



- 1 



while 



E 



E 



exp I 



max 

l<m<n 



E 



exp I 



- 1 



< max 

l<??i<n 2 



20,1 

E[r„,i|2] ^ 



< 



- 1 



E 



E 



xE 

m— 1 



exp z 



. ( u, y„i) 

/2(/)„ 



20n 1 2 

m— 1 



Thus for fixed u 

n 



m — 1 



1 — exp i 



20n 



(u,Y„. 



max 

l<m<n 8 



- 1 



/20^ 



Taking real parts and splitting the expectation at ||Km 



1 " 
-(u,F„w)-;^E 



1 — cos 



E^ 



1 — cos 



(w, Y„ 



Y„r > ea 



0(1), 



where we must bear in mind that the o(l) term depends on u. The right-hand 
side is bounded above by 



E^E 

m=l 



2xBz^; ||r„i|2>e0„ 



+ 0(1) < - + o(l): 
while the left-hand side is bounded below by

n 
m— 1 

n 

= 

rn—l 

by (16); thus 




The variant Lindeberg condition (13) now follows by summing over vectors
forming an orthonormal basis, and choosing suitably large $\|u\|$. □




5. Central limit theory for empirical Frechet means 

In order to discuss the second-order theory of empirical Fréchet means, namely
central limit theorems, we augment the metric space structure of $\mathcal{X}$ by moving to
the context of a complete and connected Riemannian manifold $M$ of dimension
$d$. Let $\mathrm{dist}(x, y)$ be the Riemannian distance between points $x, y \in M$. For any
$x \in M$, let $C_x$ denote the cut locus of $x$. Let $\mathrm{Exp}_x : T_xM \to M$ be the Exponential
map from the tangent space $T_xM$ to $M$; observe that $\mathrm{Exp}_x^{-1}(y)$ can be defined
uniquely for $y \notin C_x$ by $\mathrm{Exp}_x^{-1}(y) = \gamma'(0)$, where $\gamma : [0, 1] \to M$ is the unique
minimal geodesic running from $x$ to $y$. Now let $\Pi_{x,y} : T_xM \to T_yM$ be the
parallel transport map along the geodesic $\gamma$, and note that $\Pi_{x,y}^{-1} = \Pi_{y,x}$, both
being defined when $x \notin C_y$, equivalently $y \notin C_x$. Finally, denote the covariant
derivative by $\nabla$: if $U$ is a smooth vectorfield and $\gamma$ is a geodesic then the covariant
derivative of $U$ at $\gamma(0)$ in the direction $\gamma'(0)$ is given by

$$\nabla_{\gamma'(0)}U \;=\; \lim_{s\downarrow 0}\frac{\Pi_{\gamma(s),\gamma(0)}\,U(\gamma(s)) - U(\gamma(0))}{s} .$$

Moreover $\nabla_{\gamma'(0)}U$ depends only on the tangent vector $\gamma'(0)$, rather than the actual
curve $\gamma$.
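These objects are explicit on the unit sphere $S^2 \subset \mathbb{R}^3$, where the cut locus of $x$ is the antipodal point $-x$. The following is a minimal sketch for that special case only; the function names `exp_map`, `log_map` and `transport` are illustrative choices, and points and tangent vectors are represented as coordinate triples in $\mathbb{R}^3$.

```python
import math

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def lin(c1, a, c2, b):
    """Linear combination c1*a + c2*b of two 3-vectors."""
    return tuple(c1 * s + c2 * t for s, t in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def exp_map(x, v):
    """Exp_x(v): follow the great circle from x with initial velocity v."""
    r = norm(v)
    if r < 1e-15:
        return x
    return lin(math.cos(r), x, math.sin(r) / r, v)

def log_map(x, y):
    """Exp_x^{-1}(y) = gamma'(0); only meaningful when y is not antipodal to x."""
    c = dot(x, y)
    w = lin(1.0, y, -c, x)            # part of y tangent to the sphere at x
    s = norm(w)                       # equals sin(dist(x, y))
    if s < 1e-15:
        return (0.0, 0.0, 0.0)
    return lin(math.atan2(s, c) / s, w, 0.0, w)

def transport(x, y, v):
    """Parallel transport Pi_{x,y} v along the minimal geodesic from x to y."""
    lg = log_map(x, y)
    r = norm(lg)
    if r < 1e-15:
        return v
    e = lin(1.0 / r, lg, 0.0, lg)     # unit tangent at x pointing towards y
    a = dot(v, e)
    # The component along the geodesic rotates with it; the rest is unchanged.
    e_at_y = lin(-math.sin(r), x, math.cos(r), e)
    return lin(1.0, lin(1.0, v, -a, e), a, e_at_y)

o = (0.0, 0.0, 1.0)
y = exp_map(o, (0.3, 0.4, 0.0))
v = (0.2, -0.1, 0.0)
w = transport(o, y, v)
back = transport(y, o, w)             # illustrates Pi_{x,y}^{-1} = Pi_{y,x}
```

Transporting `v` from `o` to `y` and back returns `v`, and the transported vector is tangent at `y` with unchanged length.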

Our discussion concerns a sequence of independent (but not identically
distributed) random variables $X_1, X_2, \ldots$, taking values in $M$, for which each
$\mathbb{E}\left[\mathrm{dist}(x, X_i)^2\right]$ is finite for some (and therefore for all) $x$, and which share a
common Fréchet mean $o \in M$. Furthermore we suppose that

$$\mathbb{P}\left[X_n \in C_o\right] = 0 \quad\text{for } n \ge 1 . \tag{22}$$

For each $n$ we choose $\mathcal{E}(X_1, \ldots, X_n)$ to be a measurably selected empirical local
Fréchet mean of $X_1, \ldots, X_n$, and we suppose it possible to make these choices
so that $\mathcal{E}(X_1, \ldots, X_n)$ converges to $o$ in probability. (Theorem 2 delineates a
large class of cases in which this can be done.)
For each $i \ge 1$ we can define a random vectorfield $Y_i$ on $M \setminus C_{X_i}$ by

$$Y_i(x) = \mathrm{Exp}_x^{-1}(X_i) . \tag{23}$$

Here we use the definition of $\mathrm{Exp}_x^{-1}$ on $M \setminus C_x$; in the cases when $\mathrm{Exp}_x^{-1}(X_i)$
is not defined we choose $Y_i$ measurably but otherwise arbitrarily from the
pre-image of $X_i$ under $\mathrm{Exp}_x$. In fact it can be shown that (23) defines $Y_i(x)$ uniquely
for almost all $x$ with probability 1; moreover the cut locus condition (22) ensures
that $Y_i(o)$ in particular is almost surely well-defined.

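Since $o$ is a Fréchet mean of each $X_i$, it follows that $\mathbb{E}\left[Y_i(o)\right] = 0$; moreover
the finiteness of $\mathbb{E}\left[\mathrm{dist}(o, X_i)^2\right]$ implies the finiteness of $\mathbb{E}\left[\|Y_i(o)\|^2\right]$, which is the
trace of the variance-covariance matrix of the random vector $Y_i(o)$. Moreover the
calculus of manifolds shows that

$$Y_i(x) \;=\; -\mathrm{dist}(x, X_i)\,\mathrm{grad}_x\,\mathrm{dist}(x, X_i) \;=\; \mathrm{grad}_x\left(-\tfrac12\,\mathrm{dist}(x, X_i)^2\right) . \tag{24}$$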
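Identity (24) can be checked numerically in a simple case. The sketch below is an illustration on the unit sphere $S^2$ only, with a hypothetical data point `p` standing in for $X_i$: it compares $\langle Y_i(x), v\rangle$ with a central finite difference of $t \mapsto \tfrac12\,\mathrm{dist}(\mathrm{Exp}_x(tv), X_i)^2$ at $t = 0$, which (24) predicts to be its negative.

```python
import math

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def lin(c1, a, c2, b):
    return tuple(c1 * s + c2 * t for s, t in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def exp_map(x, v):
    r = norm(v)
    if r < 1e-15:
        return x
    return lin(math.cos(r), x, math.sin(r) / r, v)

def log_map(x, y):
    c = dot(x, y)
    w = lin(1.0, y, -c, x)
    s = norm(w)
    if s < 1e-15:
        return (0.0, 0.0, 0.0)
    return lin(math.atan2(s, c) / s, w, 0.0, w)

def half_sq_dist(x, p):
    # dist(x, p) is the length of Exp_x^{-1}(p) away from the cut locus.
    return 0.5 * norm(log_map(x, p)) ** 2

x = (0.0, 0.0, 1.0)
p = exp_map(x, (0.5, 0.2, 0.0))       # hypothetical data point X_i
v = (0.3, 0.7, 0.0)                   # tangent direction at x
h = 1e-5

# Central difference of t -> (1/2) dist(Exp_x(t v), p)^2 at t = 0.
fd = (half_sq_dist(exp_map(x, lin(h, v, 0.0, v)), p)
      - half_sq_dist(exp_map(x, lin(-h, v, 0.0, v)), p)) / (2.0 * h)

lhs = dot(log_map(x, p), v)           # <Y_i(x), v> with Y_i(x) = Exp_x^{-1}(X_i)
residual = lhs + fd                   # (24) predicts this is approximately zero
```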

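Indeed, if $x \in M \setminus C_{X_i}$ then covariant differentiation defines a symmetric $(d \times d)$
tensor $H_i(x) = -(\nabla Y_i)(x)$, acting on vectorfields $U$, $V$ by

$$\langle H_i U, V\rangle(x) \;=\; \langle -\nabla_U Y_i, V\rangle(x) \;=\; \mathrm{Hess}_x\left(\tfrac12\,\mathrm{dist}(x, X_i)^2\right)(U, V) . \tag{25}$$

(The sign of $H_i$ is chosen so that if $X_i = o$ then $H_i(o)$ is the identity tensor.)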

As noted above, the assumption that $o$ is a Fréchet mean of $X_i$ for all $i \ge 1$
implies that $Y_i(o) = \mathrm{Exp}_o^{-1}(X_i)$ determines a sequence of independent random
variables with zero mean on $T_o(M)$. Then Theorem 3 and Corollary 2 capture
the conditions under which the normalized sum $(Y_1(o) + \ldots + Y_n(o))/\sqrt{2\phi_n(o)}$
is asymptotically multivariate normal (where $\phi_n$ is the aggregate energy function
as defined in Section 2). Moreover a first-order Taylor expansion argument
suggests that (under further regularity conditions) the Exponential map of a
suitable transformation of this normalized sum should approximate the local
empirical Fréchet mean $\mathcal{E}(X_1, \ldots, X_n)$; this corresponds to an application of
Newton's root-finding method. This is indeed the case, and forms the main
result of this section. However before we turn to this we must first prove a
preliminary geometric result, required in order to control the effects of the
approximation.
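To make the root-finding picture concrete, the following minimal sketch computes a local empirical Fréchet mean on the unit sphere $S^2$ by seeking a zero of $x \mapsto \sum_i Y_i(x)$. It is an assumption-laden illustration: instead of the Newton step (which would use the inverse of $\sum_i H_i$), it uses the simpler fixed-point iteration $x \leftarrow \mathrm{Exp}_x\left(\tfrac1n\sum_i Y_i(x)\right)$, and the function name `frechet_mean` and the test configuration are hypothetical.

```python
import math

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def lin(c1, a, c2, b):
    return tuple(c1 * s + c2 * t for s, t in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def exp_map(x, v):
    r = norm(v)
    if r < 1e-15:
        return x
    return lin(math.cos(r), x, math.sin(r) / r, v)

def log_map(x, y):
    c = dot(x, y)
    w = lin(1.0, y, -c, x)
    s = norm(w)
    if s < 1e-15:
        return (0.0, 0.0, 0.0)
    return lin(math.atan2(s, c) / s, w, 0.0, w)

def frechet_mean(points, start, steps=100, tol=1e-12):
    """Iterate x <- Exp_x(mean of Exp_x^{-1}(X_i)) until the mean tangent vector vanishes."""
    x = start
    for _ in range(steps):
        g = (0.0, 0.0, 0.0)
        for p in points:
            g = lin(1.0, g, 1.0 / len(points), log_map(x, p))
        if norm(g) < tol:
            break
        x = exp_map(x, g)
        x = lin(1.0 / norm(x), x, 0.0, x)   # guard against round-off drift off the sphere
    return x

# Three points at colatitude 0.3, equally spaced in longitude: by symmetry the
# north pole is a zero of the summed vectorfield.
colat = 0.3
points = [(math.sin(colat) * math.cos(a), math.sin(colat) * math.sin(a), math.cos(colat))
          for a in (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)]
mean = frechet_mean(points, start=points[0])
```

For data this concentrated the iteration contracts quickly; for widely dispersed data neither this scheme nor uniqueness of the local mean should be taken for granted.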

We begin by constructing a certain orthonormal frame field $e_1, \ldots, e_d$ over
$M \setminus C_o$. Pick $e_1(o), \ldots, e_d(o)$ to be an orthonormal basis for $T_o(M)$, and extend
by parallel transport along minimal geodesics from $o$ over all of $M \setminus C_o$: $e_r(x) =
\Pi_{o,x}\,e_r(o)$, for $x \in M \setminus C_o$. By the properties of geodesic normal coordinates, the
vectorfields $\nabla_{e_r}e_s$ all vanish at $o$.

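Lemma 3. For given $\varepsilon > 0$, choose $\rho > 0$ such that $\mathrm{ball}(o, \rho) \subset M \setminus C_o$ and
$\|\nabla_{e_r}e_s\| < \varepsilon/d$ within $\mathrm{ball}(o, \rho)$, for $r, s = 1, \ldots, d$. Set $Z_{r,i} = \langle Y_i, e_r\rangle e_r$, for
some $Y_i$. Then (viewing $\nabla Z_{r,i}$ as a symmetric $(d \times d)$ tensor) for $x \in \mathrm{ball}(o, \rho)$
we have

$$\left\|\Pi_{x,o}\nabla Z_{r,i}(x) - \nabla Z_{r,i}(o)\right\| \;\le\; (1 + 2\varepsilon\rho)\sup_{x'\in\mathrm{ball}(o,\rho)}\left\|\Pi_{x',o}\nabla Y_i(x') - \nabla Y_i(o)\right\| \;+\; 2\varepsilon\left(\|Y_i(o)\| + \|\nabla Y_i(o)\|\,\rho\right) . \tag{26}$$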
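Proof. We suppress the dependence on the suffix $i$ for the sake of convenience
of exposition, and write $Z_r = Z_{r,i}$, $Y = Y_i$. First consider $\nabla_V Z_r$ for a general
smooth vectorfield $V$. By the calculus of covariant differentiation

$$\nabla_V Z_r \;=\; \nabla_V\left(\langle Y, e_r\rangle e_r\right) \;=\; \langle\nabla_V Y, e_r\rangle e_r + \langle Y, \nabla_V e_r\rangle e_r + \langle Y, e_r\rangle\nabla_V e_r .$$

Because $\nabla_V e_r$ vanishes at $o$,

$$\Pi_{x,o}\nabla_V Z_r(x) - \nabla_V Z_r(o) \;=\; \left(\langle\nabla_V Y, e_r\rangle(x) - \langle\nabla_V Y, e_r\rangle(o)\right)e_r(o) + \langle Y, \nabla_V e_r\rangle(x)\,e_r(o) + \langle Y, e_r\rangle(x)\,\Pi_{x,o}\nabla_V e_r(x) .$$

The coefficient of $e_r(o)$ in the first term on the right-hand side can be rewritten
as the evaluation of $\langle\Pi_{x,o}\nabla_V Y - \nabla_V Y, e_r\rangle$ at $o$; the other two terms can be
expanded to achieve

$$\begin{aligned}
\Pi_{x,o}\nabla_V Z_r(x) - \nabla_V Z_r(o) \;=\;& \langle\Pi_{x,o}\nabla_V Y - \nabla_V Y, e_r\rangle(o)\,e_r(o) \\
&+ \langle Y, \Pi_{x,o}\nabla_V e_r\rangle(o)\,e_r(o) + \langle\Pi_{x,o}Y - Y, \Pi_{x,o}\nabla_V e_r\rangle(o)\,e_r(o) \\
&+ \langle Y, e_r\rangle(o)\,(\Pi_{x,o}\nabla_V e_r)(o) + \langle\Pi_{x,o}Y - Y, e_r\rangle(o)\,(\Pi_{x,o}\nabla_V e_r)(o) .
\end{aligned}$$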
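To control the size of the matrix $M = \Pi_{x,o}\nabla Z_r - \nabla Z_r$ at $o$ we shall use
the Frobenius norm $\|M\| = \sqrt{\operatorname{tr}(M^\top M)}$. Here $V$ is an arbitrary
vectorfield, hence (evaluating tensor and vectorfields at $o$ throughout) we may
deduce that

$$\begin{aligned}
\left\|\Pi_{x,o}\nabla Z_r(x) - \nabla Z_r(o)\right\| \;&\le\; \left\|\Pi_{x,o}\nabla Y(x) - \nabla Y(o)\right\| + 2\left(\left\|\Pi_{x,o}Y(x) - Y(o)\right\| + \|Y(o)\|\right)\|\nabla e_r(x)\| \\
&\le\; \left\|\Pi_{x,o}\nabla Y(x) - \nabla Y(o)\right\| + 2\varepsilon\left(\left\|\Pi_{x,o}Y(x) - Y(o)\right\| + \|Y(o)\|\right)
\end{aligned} \tag{27}$$

so long as $x \in \mathrm{ball}(o, \rho)$.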

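We now apply the Mean Value Theorem to observe that

$$\begin{aligned}
\left\|\Pi_{x,o}Y(x) - Y(o)\right\| \;&\le\; \mathrm{dist}(x, o)\sup_{x'\in\mathrm{ball}(o,\rho)}\|\nabla Y(x')\| \\
&\le\; \mathrm{dist}(x, o)\left(\|\nabla Y(o)\| + \sup_{x'\in\mathrm{ball}(o,\rho)}\left\|\Pi_{x',o}\nabla Y(x') - \nabla Y(o)\right\|\right)
\end{aligned} \tag{28}$$

and thus (restoring the dependence on the suffix $i$) we can apply (28) to (27)
and combine with $\mathrm{dist}(o, x) < \rho$ to deduce the required inequality. □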

The above lemma allows us to control the errors arising from the approxi- 
mation implicit in the Newton method described above. We can now state and 
prove the main theorem of this section. 
Theorem 4. Suppose that $X_1, X_2, \ldots$ are independent non-identically
distributed random variables taking values in $M$, such that for all $n$ and all $x \in M$
the aggregate energy function $\phi_n(x) = \sum_{i=1}^{n}\tfrac12\,\mathbb{E}\left[\mathrm{dist}(x, X_i)^2\right]$ is finite.
Suppose that $o$ is a local Fréchet mean of each of the $X_i$ and moreover suppose
that $\mathbb{P}\left[X_i \in C_o\right] = 0$ for each $i$. Let $Y_i = \mathrm{Exp}_o^{-1}(X_i)$. Let $x_n = \mathcal{E}(X_1, \ldots, X_n)$
be a measurable choice of local empirical Fréchet means such that $x_n \to o$ in
probability. Suppose that the following conditions hold:

1. $\phi_n(o)$ is of at least linear growth, so $\liminf_{n\to\infty}\phi_n(o)/n = C_1 > 0$ for a
finite positive constant $C_1 > 0$;

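2. For each sufficiently small $\rho > 0$, as $n \to \infty$ so

$$\frac{1}{\phi_n(o)}\sum_{i=1}^{n}\ \sup_{x'\in\mathrm{ball}(o,\rho)}\mathbb{E}\left[\left\|\Pi_{x',o}H_i(x') - H_i(o)\right\|\right] \;\to\; 0 ,$$

where $H_i$ is as given in (25);

3. There is a finite constant $C_2$ such that

$$\limsup_{n\to\infty}\ \frac{1}{\phi_n(o)}\sum_{i=1}^{n}\mathbb{E}\left[\|H_i(o)\|^2\right] \;<\; C_2 ;$$

4. Let $\bar{H}_n$ be the coordinate-wise expectation

$$\bar{H}_n \;=\; \frac{\mathbb{E}\left[H_1(o) + \ldots + H_n(o)\right]}{2\phi_n(o)} .$$

Then the symmetric matrix $\bar{H}_n$ is asymptotically non-singular; there is a
positive constant $C_3 > 0$ with $\limsup_{n\to\infty}\|\bar{H}_n^{-1}\| = C_3$.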
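5. Finally we require a condition of Lindeberg type: for each $\varepsilon > 0$, as $n \to \infty$
so

$$\frac{1}{\phi_n(o)}\sum_{i=1}^{n}\mathbb{E}\left[\mathrm{dist}(o, X_i)^2 ;\ \mathrm{dist}(o, X_i)^2 > \varepsilon\,\phi_n(o)\right] \;\to\; 0 .$$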

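Let $Z_n$ have the multivariate normal distribution with zero mean and
variance-covariance matrix $\bar{H}_n^{-1}V_n\bar{H}_n^{-1}$, where $V_n$ is the variance-covariance matrix of
$(Y_1 + \ldots + Y_n)/\sqrt{2\phi_n(o)}$. Then as $n \to \infty$ so

$$\widetilde{W}_1\left(\sqrt{2\phi_n(o)}\ \mathrm{Exp}_o^{-1}(x_n),\ Z_n\right) \;\to\; 0 .$$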
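The Lindeberg-type condition 5 is easy to evaluate in the degenerate situation where each distance $\mathrm{dist}(o, X_i)$ is almost surely equal to a constant $r_i$ (for instance, a distribution that is rotationally symmetric about $o$ on a geodesic sphere of radius $r_i$). The following sketch rests on that simplifying assumption, and the function name `lindeberg_ratio` is hypothetical; it contrasts bounded radii, for which the ratio vanishes, with geometrically growing radii, for which the last term dominates the energy.

```python
def lindeberg_ratio(radii, eps):
    """(1/phi_n) * sum_i E[d_i^2 ; d_i^2 > eps * phi_n] when dist(o, X_i) = radii[i] a.s."""
    phi_n = sum(0.5 * r * r for r in radii)          # aggregate energy phi_n(o)
    return sum(r * r for r in radii if r * r > eps * phi_n) / phi_n

# Bounded radii: no single term exceeds eps * phi_n once n is large.
bounded = [1.0 + 0.5 * (i % 2) for i in range(1000)]
# Geometric radii 2, 4, ..., 2^20: the final term carries most of the energy.
geometric = [2.0 ** i for i in range(1, 21)]

ratio_bounded = lindeberg_ratio(bounded, 0.01)
ratio_geometric = lindeberg_ratio(geometric, 0.5)
```

In the second case the ratio stays bounded away from zero, so condition 5 fails and the theorem gives no normal approximation.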



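Proof. Begin by representing $\sum_{i=1}^{n}Y_i(x) = \sum_{i=1}^{n}\mathrm{grad}_x\left(-\tfrac12\,\mathrm{dist}(x, X_i)^2\right)$ by a
first-order Taylor series expansion about $o$: if $\gamma_x$ is a minimal geodesic begun at
$o$ and ending at $x \in M \setminus C_o$ at unit time then

$$\Pi_{x,o}\sum_{i=1}^{n}Y_i(x) \;=\; \sum_{i=1}^{n}Y_i(o) - \sum_{i=1}^{n}H_i(o)\,\gamma_x'(0) + \Delta_n(x)\,\gamma_x'(0) ,$$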
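where the Mean Value Theorem can be applied to show that the matrix correction
term $\Delta_n(x)$ can be written as

$$\Delta_n(x)\,U \;=\; \sum_{r=1}^{d}\sum_{i=1}^{n}\left(\Pi_{\gamma_x(\theta_r),o}\nabla_U Z_{r,i}(\gamma_x(\theta_r)) - \nabla_U Z_{r,i}(o)\right)$$

for $Z_{r,i} = \langle Y_i, e_r\rangle e_r$ as defined in Lemma 3, and for suitable $0 < \theta_1, \ldots, \theta_d < 1$.
Choosing $\rho > 0$ given $\varepsilon$ as in Lemma 3, if $x \in \mathrm{ball}(o, \rho)$ then

$$\|\Delta_n(x)\| \;\le\; \sum_{r=1}^{d}\sum_{i=1}^{n}\left((1 + 2\varepsilon\rho)\sup_{x'\in\mathrm{ball}(o,\rho)}\left\|\Pi_{x',o}\nabla Y_i(x') - \nabla Y_i(o)\right\| + 2\varepsilon\left(\|Y_i(o)\| + \rho\,\|\nabla Y_i(o)\|\right)\right) . \tag{29}$$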



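Now choose $x = x_n = \mathcal{E}(X_1, \ldots, X_n)$. Since $x_n$ is a local empirical Fréchet
mean, the summed vectorfield vanishes there, so that $\Pi_{x_n,o}\sum_{i=1}^{n}Y_i(x_n) = 0$; moreover
$\gamma_x'(0) = \mathrm{Exp}_o^{-1}(x)$. If $x_n = \mathcal{E}(X_1, \ldots, X_n) \in \mathrm{ball}(o, \rho)$ then

$$\sum_{i=1}^{n}Y_i(o) \;=\; \left(\sum_{i=1}^{n}H_i(o) - \Delta_n(x_n)\right)\mathrm{Exp}_o^{-1}(x_n) .$$

Consequently, so long as $\sum_{i=1}^{n}H_i(o) - \Delta_n(x_n)$ is invertible, we may write

$$x_n \;=\; \mathcal{E}(X_1, \ldots, X_n) \;=\; \mathrm{Exp}_o\left(\left(\sum_{i=1}^{n}H_i(o) - \Delta_n(x_n)\right)^{-1}\sum_{i=1}^{n}Y_i(o)\right) . \tag{30}$$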

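Use the aggregate energy function $\phi_n(x) = \sum_{i=1}^{n}\mathbb{E}\left[\tfrac12\,\mathrm{dist}(X_i, x)^2\right]$ (defined in
Section 3) to adjust the above equation into a form hinting at a central limit
approximation for $\mathcal{E}(X_1, \ldots, X_n)$:

$$\sqrt{2\phi_n(o)} \times \mathrm{Exp}_o^{-1}\left(\mathcal{E}(X_1, \ldots, X_n)\right) \;=\; \left(\frac{\sum_{i=1}^{n}H_i(o)}{2\phi_n(o)} - \frac{\Delta_n(x_n)}{2\phi_n(o)}\right)^{-1}\frac{\sum_{i=1}^{n}Y_i(o)}{\sqrt{2\phi_n(o)}} . \tag{31}$$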

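Using our estimates on the Frobenius norm $\|\Delta_n(x_n)\|$,

$$\frac{\|\Delta_n(x_n)\|}{2\phi_n(o)} \;\le\; d\left((1 + 2\varepsilon\rho)\,\frac{\sum_{i=1}^{n}\sup_{x'\in\mathrm{ball}(o,\rho)}\left\|\Pi_{x',o}\nabla Y_i(x') - \nabla Y_i(o)\right\|}{2\phi_n(o)} \;+\; \varepsilon\,\frac{\sum_{i=1}^{n}\|Y_i(o)\|}{\phi_n(o)} \;+\; \varepsilon\rho\,\frac{\sum_{i=1}^{n}\|\nabla Y_i(o)\|}{\phi_n(o)}\right) .$$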

We are given that $x_n \to o$ in probability, so with probability tending to 1 we
may apply condition 2 of the theorem to the first of these summands, together
with the Markov inequality, and deduce that

$$(1 + 2\varepsilon\rho)\,\frac{\sum_{i=1}^{n}\sup_{x'\in\mathrm{ball}(o,\rho)}\left\|\Pi_{x',o}\nabla Y_i(x') - \nabla Y_i(o)\right\|}{2\phi_n(o)} \;\to\; 0 \quad\text{in probability.}$$
Application of the Cauchy-Schwarz inequality to the second summand,
together with the definition of the aggregate energy function, the fact that $\|Y_i(o)\| =
\mathrm{dist}(o, X_i)$, and condition 1 of the theorem, shows that



A similar argument, but using condition 3 of the theorem as well as condition 
1, allows us to deduce that 



^^(o) (^„(o) y 0„(o) V f^nlo) V Ci 

Once again we may use the assumption that $x_n \to o$ in probability; it follows
from this and the Markov inequality that we may choose $\varepsilon = \varepsilon_n$ to decrease to
zero in such a manner that

<Pn[0) (Pn(0) 

Accordingly it follows that the matrix error term is negligible:

$$\frac{\Delta_n(x_n)}{2\phi_n(o)} \;\to\; 0 \quad\text{in probability.} \tag{32}$$



Now consider the behaviour of $\sum_{i=1}^{n}H_i(o)/(2\phi_n(o))$. We can control the sum of the
variances of the components of this matrix: by independence, and the fact that
variance is always bounded above by second moment, we deduce that the sum
of variances is bounded above by

$$\sum_{i=1}^{n}\frac{\mathbb{E}\left[\|H_i(o)\|^2\right]}{4\phi_n(o)^2} ,$$

which converges to zero by conditions 1 and 3 of the theorem. Accordingly

$$\frac{\sum_{i=1}^{n}H_i(o)}{2\phi_n(o)} - \bar{H}_n \;\to\; 0 \quad\text{in probability.}$$

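Condition 4 of the theorem, together with the negligibility of $\Delta_n(x_n)/(2\phi_n(o))$ established
above in (32), implies that the probability of the following being invertible
converges to 1:

$$\frac{\sum_{i=1}^{n}H_i(o)}{2\phi_n(o)} - \frac{\Delta_n(x_n)}{2\phi_n(o)} .$$

Moreover we may deduce that

$$\left(\frac{\sum_{i=1}^{n}H_i(o)}{2\phi_n(o)} - \frac{\Delta_n(x_n)}{2\phi_n(o)}\right)^{-1} - \bar{H}_n^{-1} \;\to\; 0 \quad\text{in probability.} \tag{33}$$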
Finally we consider the asymptotic distributional behaviour of

$$\frac{\sum_{i=1}^{n}Y_i(o)}{\sqrt{2\phi_n(o)}} .$$

The Lindeberg condition 5 of the theorem translates directly into a condition
of Lindeberg type on the $Y_i$: since $\|Y_i(o)\| = \mathrm{dist}(o, X_i)$, and since $\mathbb{E}\left[Y_i(o)\right] = 0$
as a consequence of $o$ being a local Fréchet mean of $X_i$, as $n \to \infty$ so

$$\frac{1}{\phi_n(o)}\sum_{i=1}^{n}\mathbb{E}\left[\|Y_i(o)\|^2 ;\ \|Y_i(o)\|^2 > \varepsilon\,\phi_n(o)\right] \;\to\; 0 .$$

Now Theorem 3 shows that

$$\widetilde{W}_1\left(\frac{Y_1(o) + \ldots + Y_n(o)}{\sqrt{2\phi_n(o)}},\ Z_n\right) \;\to\; 0 ,$$

where $Z_n$ has the multivariate $d$-dimensional normal distribution of zero mean
and variance-covariance matrix $V_n$.

The proof of the theorem is now completed by using observation (33), since 
properties of the Wasserstein distance allow us to deduce 

w((\-Ii^2l A„(x„) V' yi(o) + ... + y„(o) 



^ 2(j)n{o) 2(t)„{o) J s/Wn 

from the convergence in probability specified in (33), together with the upper 
bound supplied by condition 4 of the theorem. □ 

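We finish by looking at a few special cases. First, if we assume that the $X_n$ are
actually identically distributed, then $\phi_n(o) = \tfrac{n}{2}\,\mathbb{E}\left[\mathrm{dist}(o, X_1)^2\right]$. Accordingly,
if $0 < \mathbb{E}\left[\mathrm{dist}(o, X_1)^2\right] < \infty$ then the conditions 1 and 5 of Theorem 4 hold
trivially. Moreover,

$$\bar{H}_n \;=\; \bar{H} \;=\; \mathbb{E}\left[H_1(o)\right] / \mathbb{E}\left[\mathrm{dist}(o, X_1)^2\right]$$

and

$$V_n \;=\; V \;=\; \mathbb{E}\left[\Phi_{o,X_1}\right] / \mathbb{E}\left[\mathrm{dist}(o, X_1)^2\right] ,$$

where $\Phi_{x,y}$ is the self-adjoint linear operator on $T_x(M)$ defined by

$$\Phi_{x,y} : v \mapsto \left\langle\mathrm{Exp}_x^{-1}(y), v\right\rangle\,\mathrm{Exp}_x^{-1}(y) . \tag{34}$$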
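In an orthonormal basis of $T_o(M)$ the operator $\Phi_{o,X_1}$ of (34) has matrix $YY^\top$, where $Y$ is the coordinate vector of $\mathrm{Exp}_o^{-1}(X_1)$, so $V$ can be estimated directly from tangent coordinates. The sketch below is illustrative only: the function name `estimate_V` is hypothetical, and the Gaussian tangent coordinates are an arbitrary stand-in for the coordinates of $\mathrm{Exp}_o^{-1}(X_i)$. It shows that the estimate has unit trace by construction, since $\operatorname{tr}(YY^\top) = \|Y\|^2 = \mathrm{dist}(o, X_1)^2$.

```python
import random

def estimate_V(tangent_vectors):
    """Empirical version of V = E[Phi_{o,X_1}] / E[dist(o, X_1)^2] in tangent coordinates."""
    n = len(tangent_vectors)
    d = len(tangent_vectors[0])
    # Empirical second-moment matrix, estimating E[Y Y^T] = E[Phi_{o,X_1}].
    second = [[sum(y[j] * y[k] for y in tangent_vectors) / n for k in range(d)]
              for j in range(d)]
    # Its trace estimates E[||Y||^2] = E[dist(o, X_1)^2].
    energy = sum(second[j][j] for j in range(d))
    return [[second[j][k] / energy for k in range(d)] for j in range(d)]

rng = random.Random(0)
# Hypothetical tangent coordinates: more spread along the first axis than the second.
sample = [(rng.gauss(0.0, 0.2), rng.gauss(0.0, 0.1)) for _ in range(2000)]
V_hat = estimate_V(sample)
```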

Hence, the following is a direct consequence of Theorem 4. 

Corollary 3. Suppose that $X_1, X_2, \ldots$ is a sequence of independent and
identically distributed random variables on $M$ with finite $\mathbb{E}\left[\mathrm{dist}(x, X_1)^2\right]$. Suppose
that $o$ is the local Fréchet mean of $X_1$ and that $\mathbb{P}\left[X_1 \in C_o\right] = 0$. Let $x_n =
\mathcal{E}(X_1, \ldots, X_n)$ be a measurable choice of local empirical Fréchet means such
that $x_n \to o$ in probability. Assume that

(i) $\lim_{\rho\to 0}\mathbb{E}\left[\sup_{x'\in\mathrm{ball}(o,\rho)}\left\|\Pi_{x',o}H_1(x') - H_1(o)\right\|\right] = 0$;

(ii) $\mathbb{E}\left[\|H_1(o)\|^2\right] < \infty$;

(iii) $\mathbb{E}\left[H_1(o)\right]^{-1}$ exists.



Then we have the following weak convergence as $n \to \infty$:



where the limit is the multivariate normal distribution with zero mean and
variance-covariance matrix $H^{-1}\,\mathbb{E}\left[\Phi_{o,X_1}\right]H^{-1}$.

If there exists a local coordinate chart $\psi(x) = (x_1(x), \ldots, x_d(x))$ with a
domain which contains the support of the distribution of $X_1$, then let us write
$(\xi_1, \ldots, \xi_d)$ and $(\zeta_1^n, \ldots, \zeta_d^n)$ respectively for the coordinates of $\mathrm{Exp}_o^{-1}(X_1)$ and
$\mathrm{Exp}_o^{-1}(x_n)$ with respect to the basis $(\partial_{x_1}, \ldots, \partial_{x_d})$ in $T_o(M)$. The following
result is the version of Corollary 3 in terms of these coordinates.



where G = {{dxj,dxf.)), = (E[^j^fc]) and is the matrix of the linear 
operator Hi{o) under the coordinate chart ip with 



and with F^^- being the Christoffel symbols for the chosen coordinate chart. 

If the coordinates $\psi$ are normal coordinates centred at $o$ corresponding to
an orthonormal basis $(e_1, \ldots, e_d)$ of $T_o(M)$, then $G = I$, and $(\xi_1, \ldots, \xi_d)$ and
$(\zeta_1^n, \ldots, \zeta_d^n)$ become the normal coordinates, centred at $o$, of $X_1$ and $x_n$
respectively. Moreover, under a normal coordinate system, all the Christoffel symbols
disappear at the centre $o$ and $\xi_k = -\tfrac12\nabla_{e_k}\mathrm{dist}(x, X_1)^2\big|_{x=o}$, where $\nabla_{e_k}$ acts on
the first variable of $\mathrm{dist}^2$. Under normal coordinates, Corollary 4 recovers the
result of Bhattacharya and Patrangenaru (2005) at the Fréchet mean $o$.

Finally, if $M$ either has constant sectional curvature $\kappa$ or is a Kähler manifold
of constant holomorphic sectional curvature $\kappa$ then the operator $H_i(x)$ defined
by (25) can be expressed explicitly. In the former case



Corollary 4. Write $\zeta^n = (\zeta_1^n, \ldots, \zeta_d^n)^\top$. In the case of Corollary 3,





$$H_i(x) : v \mapsto \frac{1 - f_\kappa(\mathrm{dist}(x, X_i))}{\mathrm{dist}(x, X_i)^2}\,\Phi_{x,X_i}(v) + f_\kappa(\mathrm{dist}(x, X_i))\,v$$
and, in the latter,



H,(x) : V ^ f^[dist[x,Xi})v ^ dist(a; X )'^ ^x^xAv) 

+ dist(x,XO^ 



where

$$S_\kappa(s) = \begin{cases}\sin(\sqrt{\kappa}\,s) & \kappa > 0 \\ s & \kappa = 0 \\ \sinh(\sqrt{-\kappa}\,s) & \kappa < 0,\end{cases}$$

$C_\kappa(s) = S_\kappa'(s)/\sqrt{|\kappa|}$ and where $j$ is the tensor field of isometries $j_x$ of the
tangent spaces $T_x(M)$ such that $j_x^2 = -\mathrm{id}$, $\Phi_{x,y}$ is defined by (34) and $\Phi^j_{x,y}$ is also
defined by (34) but with $\mathrm{Exp}_x^{-1}$ there replaced by $j_x \circ \mathrm{Exp}_x^{-1}$. Note that the
consequent expression for the operator $\mathbb{E}\left[H_1(o)\right]$ was obtained in Bhattacharya and Bhattacharya
(2008) when $M$ has constant curvature, and an upper bound has also been given
in the same paper for general $M$ in terms of the bound of its curvature.
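For constant positive sectional curvature the first of these formulas is simple to evaluate. The sketch below is a hedged illustration, not the paper's own computation: it assumes $f_\kappa(s) = \sqrt{\kappa}\,s\cot(\sqrt{\kappa}\,s)$ for $\kappa > 0$, which is the usual Jacobi-field expression for the Hessian of $\tfrac12\,\mathrm{dist}^2$ transverse to the geodesic, and it works in orthonormal coordinates of $T_x(M)$ with `y` the coordinate vector of $\mathrm{Exp}_x^{-1}(X_i)$. The function name `hessian_apply` is hypothetical.

```python
import math

def hessian_apply(y, v, kappa=1.0):
    """Apply H_i(x) to v for constant curvature kappa > 0, given y = Exp_x^{-1}(X_i)."""
    r = math.sqrt(sum(c * c for c in y))             # r = dist(x, X_i)
    if r < 1e-8:
        return tuple(v)                              # H_i is the identity when X_i = x
    s = math.sqrt(kappa) * r                         # requires s < pi (inside the cut locus)
    f = s / math.tan(s)                              # assumed form of f_kappa(r)
    # Phi_{x,X_i}(v) = <y, v> y, so the first term is (1 - f)/r^2 * <y, v> * y.
    coeff = (1.0 - f) * sum(a * b for a, b in zip(y, v)) / (r * r)
    return tuple(coeff * a + f * b for a, b in zip(y, v))

y = (0.6, 0.0)
radial = hessian_apply(y, y)                 # direction of the geodesic towards X_i
transverse = hessian_apply(y, (0.0, 1.0))    # direction orthogonal to that geodesic
at_centre = hessian_apply((0.0, 0.0), (0.3, -0.2))
```

Under this assumption the operator acts as the identity along the geodesic towards $X_i$ and scales transverse directions by $f_\kappa(\mathrm{dist}(x, X_i))$, consistent with the remark following (25) that $H_i(o)$ is the identity when $X_i = o$.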